Abstract. Unlike some other formal systems, the proof system Metamath
has no built-in concept of “decimal number”, in the sense that arbitrary
digit strings are not recognized by the system without prior definition.
We present a system of theorems and definitions, and an algorithm that
applies them as basic operations, to perform arithmetic calculations in
a number of steps proportional to that of an arbitrary-precision arithmetic
calculation. As a case study we consider the formal proof of Bertrand’s
postulate, which required the calculation of many small primes. Using a
Mathematica implementation, we were able to complete the first formal
proof in Metamath using numbers larger than 10. Applications to the
mechanization of Metamath proofs are discussed, and a heuristic argument
for the feasibility of large proofs such as Tom Hales’ proof of the
Kepler conjecture is presented.
1 Introduction

The Metamath system, consisting of a formal proof language and computer 
verification software, was developed for the purpose of formalizing mathematics 
in a foundational theory which is as minimal as possible while still being able 
to express proofs “efficiently” (in a sense we will make more precise later) [1].
Although Metamath supports arbitrary axiom systems, we are mainly concerned
in this paper with the set.mm database, which formalizes much of the traditional
mathematics curriculum into a ZFC-based axiomatization [2].

In this context, one can define a model of the real numbers and show that it
satisfies the usual properties, and within this set we have the natural numbers
$\mathbb{N}$ and can give a name to the numbers $1, 2, 3, \ldots \in \mathbb{N}$.
One important point of comparison to other proof languages such as Mizar [3] is
that not every sequence of decimal digits is interpreted as a natural number. In
particular, the goal is to show rigorous derivations directly from ZFC axioms
with complete transparency and not to depend on indirect proofs of correctness
of, say, a computer arithmetic algorithm. A sequence of digits is treated as any
other identifier and only represents a natural number if it has been defined to
do so. The list of sequences so defined is quite short; we have the definitions

$$2 := 1 + 1, \quad 3 := 2 + 1, \quad 4 := 3 + 1, \quad 5 := 4 + 1, \quad 6 := 5 + 1,$$

$$7 := 6 + 1, \quad 8 := 7 + 1, \quad 9 := 8 + 1, \quad 10 := 9 + 1,$$

and this constitutes a complete listing of the defined integers in set.mm (not 
including 0 and 1, which are defined as part of the field operations). 

The primary application of set.mm has been abstract mathematics, and
such small numbers were sufficient for prior theorems, but initial attempts at
Bertrand’s postulate showed that a new method was needed in order to
systematize arithmetic on large numbers.


1.1 Bertrand’s Postulate 

Theorem 1 (Chebyshev, Erdős). For every $n \in \mathbb{N}$ there is a prime $p$
satisfying $n < p \le 2n$.

This statement was conjectured by Joseph Bertrand in 1845, from which the
problem gets its name, and it was proven in 1852 by Chebyshev. Our version of
the theorem looks like this:

$$\vdash n \in \mathbb{N} \to \exists p \in \mathbb{P}\, (n < p \land p \le 2n) \qquad \textsf{bpos}^{1}$$


Although the complete proof of this theorem is not the purpose of this paper,
this was the motivating problem that led to the developments described here.
For our formalization we targeted not Chebyshev’s proof but rather a simpler
proof due to Paul Erdős. The proof is based on a detailed asymptotic analysis
of the central binomial coefficients $\binom{2n}{n}$, and by keeping track of the
bounds involved this shows Theorem 1 for $n > 4000$. A usual exposition of the
proof will observe that the other 4000 cases can be shown by means of the sequence

$$2, 3, 5, 7, 13, 23, 43, 83, 139, 163, 317, 631, 1259, 2503, 4001, \qquad (1)$$

which can be seen to consist of primes $p_k$ such that $p_{k+1} < 2p_k$.

It is this last statement which is the most problematic for a formal system 
which cannot handle large numbers, because verifying that large numbers are 
prime requires even larger calculations, and doing such calculations by hand on 
numbers of the form $1 + 1 + \cdots + 1$ (as in our definition of the numbers up to 10)
is quite impractical, requiring $O(n^2)$ operations to multiply numbers of order $n$.


1 The sans-serif labels mentioned in this paper refer to theorem statements in set.mm;
they can be viewed at e.g. http://us.metamath.org/mpegif/bpos.html for bpos.

Arithmetic in Metamath, Case Study: Bertrand’s Postulate 




1.2 Decimal Arithmetic 

To solve the problem of the space requirements of unary numbers, the standard
approach is to use base-10 arithmetic, or more generally base-$b$ arithmetic for
$b \ge 2$, in which a nonnegative integer $n \in \mathbb{N}_0$ is represented by an
expression of the form

$$n = \sum_{i=0}^{k} a_i b^i = b \cdot (b \cdot (\cdots (b \cdot a_k) + \cdots) + a_1) + a_0. \qquad (2)$$

Here $0 \le a_i < b$, and arithmetic is performed relative to this representation,
with “addition with carry” and long multiplication, using Horner’s method in order
to efficiently recurse through the structure of the expression. (We postpone the
precise description of these (grade school) algorithms to Section 2.)
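As an illustrative aside, the nested form on the right-hand side of (2) can be sketched in Python (chosen here only for illustration; it is not the language of the implementation described later). The function names `to_digits` and `horner` are hypothetical.

```python
def to_digits(n, base):
    """Return the base-`base` digits a_0, a_1, ..., a_k of n (least significant first)."""
    if n < 0 or base < 2:
        raise ValueError("requires n >= 0 and base >= 2")
    digits = [n % base]
    n //= base
    while n > 0:
        digits.append(n % base)
        n //= base
    return digits


def horner(digits, base):
    """Evaluate base*(base*(...(base*a_k)...) + a_1) + a_0 from the digit list."""
    value = 0
    for a in reversed(digits):
        value = base * value + a
    return value
```

Each digit costs one multiplication by the base and one addition, so the number of operations grows with the number of digits rather than with the size of `n`.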


All Calculation is Addition, Multiplication and Ordering of $\mathbb{N}_0$. One
important observation is the following:

Observation 1. All calculations of a numerical nature can be reduced to the three
operations $x + y$, $x \cdot y$, $x < y$ applied to nonnegative integers.

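For instance:

$$\sqrt{2} > 1.414 \quad \text{because} \quad 1414 \cdot 1414 < 2 \cdot 1000 \cdot 1000, \qquad (3)$$

$$11 \text{ is prime} \quad \text{because} \quad 11 = 2 \cdot 5 + 1 = 3 \cdot 3 + 2, \qquad (4)$$

$$\frac{4^4}{4} < \binom{8}{4} \quad \text{because} \quad 4 \cdot 4 \cdot 4 \cdot 2 \cdot 3 \cdot 4 < 5 \cdot 6 \cdot 7 \cdot 8. \qquad (5)$$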


Note that as this is a formal proof the goal is not in calculating the result
itself, which we already know or can verify externally, but rather in efficiently
verifying the numerical claim, which comes with its own challenges. For primality
testing, this process is known as a primality certificate, and for larger primes we
use so-called “Pratt certificates” for prime verification [4]. In general, there may
be many non-numerical steps involved to set up a problem into an appropriate
form, such as the squaring and multiplication by 1000 in equation (3), or we may
want to add such steps as an additional reduction on the equation so that we
avoid an unnecessarily complicated calculation, such as canceling the common
factor $6 \cdot 4$ in equation (5). However, these “framing” steps are comparatively
few, so that it is worthwhile to coerce these calculations into the framework
described here for even moderately large numbers, say $n > 30$.
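To make the reduction concrete, the three example claims can be checked using nothing but addition, multiplication, and comparison of nonnegative integers. The following Python sketch is only an illustration; the function name `check_claims` is hypothetical.

```python
def check_claims():
    """Check examples (3)-(5) with only +, * and comparisons on nonnegative integers."""
    # (3): sqrt(2) > 1.414 after squaring and scaling by 1000 * 1000.
    sqrt2_bound = 1414 * 1414 < 2 * 1000 * 1000
    # (4): 11 leaves a nonzero remainder on division by 2 and by 3,
    # and 5 * 5 > 11, so no further trial divisors are needed.
    eleven_prime = (
        11 == 2 * 5 + 1 and 0 < 1 < 2
        and 11 == 3 * 3 + 2 and 0 < 2 < 3
        and 11 < 5 * 5
    )
    # (5): the inequality exactly as stated.
    binomial_bound = 4 * 4 * 4 * 2 * 3 * 4 < 5 * 6 * 7 * 8
    # (5) again after canceling the common factor 6 * 4 from both sides.
    reduced_bound = 4 * 4 * 4 < 5 * 7 * 2
    return sqrt2_bound, eleven_prime, binomial_bound, reduced_bound
```

The last two checks are equivalent; the reduced one simply involves smaller products.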

It is not really possible to prove Observation 1, as it is not a strictly
mathematical statement without additional explanation of what exactly is meant by
“calculation” or “numerical nature”. Its primary purpose is in justifying the
scope of the arithmetic system to be built, and it is justified by the observation
that computers can compute the various constants and special functions using
only a small fixed instruction set and an arithmetic logic unit (ALU) that
supports only these basic operations on bit strings. This observation can also be
viewed as a combination of the Church-Turing thesis and the observation that
arithmetic on $\mathbb{Q}$ can be expressed in the language of Peano Arithmetic.

Section 2 describes the metatheory of “decimal numbers” as they appear
in set.mm, and the theorems which form the basic operations upon which the
algorithm is built. Section 3 describes the Mathematica implementation of a
limited-domain automated theorem prover (ATP) for arithmetic using the decimal
theorems as a base, and Section 4 surveys the outcome of the project.

2 Setting Up “Decimal” Arithmetic Theorems in set.mm 

The State of the Database Before this Project. 

Definition 1. When describing terms, we will use the notation $\langle x \rangle$ to
refer to the defined number term with value $x$, assuming $0 \le x \le 10$, e.g.
$3 \cdot 3$ is a literal expression “$3 \cdot 3$” while $\langle 3 \cdot 3 \rangle$ is the term 9.

We take for granted the following facts, already derived in the database: 

- We have general theorems for the field operations, so that $a + b = b + a$,
  $a \cdot b = b \cdot a$, $a + 0 = a$, and $a \cdot 1 = a$. We also have theorems of the
  form $x = x$; $x = y \vdash y = x$; $x = y, y = z \vdash x = z$ available.

- As we have already mentioned in Section 1, the numbers 0-10 have been
  defined, yielding definitional theorems of the form $\langle n + 1 \rangle = n + 1$ for
  each $n \in \{1, \ldots, 9\}$, and for each such number $n$ we have the theorems
  $n \in \mathbb{N}_0$ and $n \in \mathbb{N}$ (except $n = 0$, which only has $0 \in \mathbb{N}_0$).

- We have addition and multiplication facts for these numbers. That is, if
  $1 \le n \le m \le 10$ and $m + n \le 10$ we have the theorem
  $m + n = \langle m + n \rangle$, and similarly if $1 \le n \le m \le 10$ and $mn \le 10$
  then $m \cdot n = \langle mn \rangle$ is a theorem. By combining these facts with the
  general theorems on field operations we can construct the complete subsection
  of the “addition table” and “multiplication table” expressible using numbers up
  to 10. (Recall that $6 + 5 = 11$ is not a theorem because even though $6 + 5$ is
  well-defined, “11” does not name a number and so the statement itself makes
  no sense unless 11 is first given a definition.)

- We have general inequality theorems like transitivity, and theorems of the
  form $m < n$ for each $0 \le m < n \le 10$.

This listing motivates the following definition: 

Definition 2. A basic fact is a theorem of the form $x \in \mathbb{N}$,
$x \in \mathbb{N}_0$, $x = y + z$, $x = y \cdot z$, or $x < y$, where $x, y, z$ are each
number terms selected from $0, \ldots, 10$, which asserts a true statement about
the integers $x, y, z$.

Theorem 2. There is an integer N such that every basic fact is provable in less 
than N steps. 

Proof. Immediate since there are only finitely many basic facts. □ 
It is not relevant for the analysis how large $N$ is, but by explicitly enumerating
such facts one can show that $N < 10$, or $N < 3$ if closure steps
$x \in \mathbb{N}_0, \mathbb{R}, \mathbb{C}$ are ignored.

We begin by defining the concept of “numeral”. 

Definition 3. A numeral is a term of the language of set.mm defined
recursively by the following rules:

- The terms 0, 1, 2, 3 are numerals.

- If $n$ is a nonzero numeral and $a \in \{0, 1, 2, 3\}$, then the term
  $(4 \cdot n) + a$ is a numeral.
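The recursive definition can be mirrored by a small Python sketch. The nested-tuple encoding, with `("pl", x, y)` for $x + y$ and `("tm", x, y)` for $x \cdot y$, and the function names `numeral`, `evaluate`, and `is_numeral` are assumptions made purely for illustration.

```python
def numeral(n):
    """Build the base-4 numeral term for n >= 0 as a nested tuple."""
    if n < 4:
        return n
    return ("pl", ("tm", 4, numeral(n // 4)), n % 4)


def evaluate(term):
    """Evaluate a term built from integers, "pl" (plus) and "tm" (times)."""
    if isinstance(term, int):
        return term
    head, x, y = term
    return evaluate(x) + evaluate(y) if head == "pl" else evaluate(x) * evaluate(y)


def is_numeral(term):
    """Check syntactically whether a term satisfies one of the two clauses of Definition 3."""
    if isinstance(term, int):
        return term in (0, 1, 2, 3)
    if len(term) != 3 or term[0] != "pl":
        return False
    _, high, digit = term
    return (
        isinstance(high, tuple)
        and len(high) == 3
        and high[0] == "tm"
        and high[1] == 4
        and high[2] != 0          # the leading part must be a nonzero numeral
        and is_numeral(high[2])
        and digit in (0, 1, 2, 3)
    )
```

For instance, `numeral(11)` is the tuple form of $(4 \cdot 2) + 3$, while the term $4 \cdot 3 + 7$ is rejected by `is_numeral` because 7 is not a base-4 digit.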


Why Quaternary? It is easily seen that this definition enumerates not base-10
integer representations but rather base-4 representations. It is of course necessary
to commit to a base to work in for practical purposes, and the most logical
decision for a system such as set.mm which prioritizes abstract reasoning over
numerical calculation is base 10. The reason this choice was rejected is that
the “multiplication table” for base 10 would require many more basic facts like
$7 \cdot 8 = 10 \cdot 5 + 6$, while the multiplication table of base 4 requires only
numbers as large as 9, which fits inside our available collection of basic facts.
Furthermore, within this constraint a large base allows for shorter representations,
which directly translates to shorter proofs of closure properties and other
algorithms (in addition to increasing readability). The specific choice of $4 = 2^2$
also works well with algorithms like exponentiation by squaring, used in the
calculation of $x^k \bmod n$ in primality tests (see Section 4).

Remark 1. It is important to recognize that “$n$ is a numeral” is not a statement
of the object language, but rather of the metalanguage describing the actual
structure of terms. Furthermore, these are terms for “concrete” integers, not
variables over them, and there are even valid terms for nonnegative integers that
are not equal to any numeral, such as $\mathrm{if}(\mathsf{CH}, 1, 0)$ (where
$\mathsf{CH}$ is the Continuum Hypothesis or any other independent statement), which
is provably a member of $\mathbb{N}_0$ even though neither
$\vdash \mathrm{if}(\mathsf{CH}, 1, 0) = 0$ nor $\vdash \mathrm{if}(\mathsf{CH}, 1, 0) = 1$
are theorems. However, we will show that these pathologies do not occur in the
evaluation of multiplication, addition, and ordering on other numerals.

There are three additional methods that are employed to shorten decimal
representations, by adding clauses to Definition 3. We will call these “extended
numerals”.

- We can include the numbers 4, 5, 6, 7, 8, 9, 10 as numerals. Although it is
  disruptive to some of the algorithms to allow these in the lower digits (e.g.
  considering $4 \cdot 3 + 7$ to be a numeral), it is easy to convert such
  “non-standard” digits in the most significant place, via the (object language)
  theorems

  $$\vdash (4 \cdot 1) + 0 = 4 \quad \textsf{dec4}, \qquad \ldots, \qquad \vdash (4 \cdot 2) + 2 = 10 \quad \textsf{dec10}.$$

- We can drop 0 when it occurs in a numeral; this amounts to adding the rule
  “if $n$ is a numeral then $4 \cdot n$ is a numeral” to the definition of a numeral.
  The conversion from this form to the usual form is provided by the theorem

  $$\frac{\vdash n \in \mathbb{N}_0}{\vdash 4n + 0 = 4n} \quad \textsf{dec0u}.$$

- We can define a function $(x : y) = 4x + y$ and add the rule “if $x$ is a
  numeral and $y \in \{0, 1, 2, 3\}$ then $(x : y)$ is a numeral”. Since
  multiplication and addition have explicit grouping in set.mm, this halves the
  number of parentheses and improves readability, from
  $((4 \cdot ((4 \cdot 3) + 2)) + 1)$ to $((3 : 2) : 1)$. The conversion for this form
  is provided by the theorem

  $$\frac{\vdash x \in \mathbb{N}_0 \qquad \vdash y \in \mathbb{N}_0}{\vdash 4x + y = (x : y)} \quad \textsf{decfv}.$$


In Remark 1 we observed that not all nonnegative integers are numerals, but
the converse is true: every numeral is a nonnegative integer (Theorem 3). Thus in
the above theorems we can use the antecedent $\vdash n \in \mathbb{N}_0$ in place of
“$n$ is a numeral”, which cannot itself be used because it is not a statement of
the object language. This weaker notion is sufficient to assure that $n$ has all
the general properties we can expect of numerals, like $n + 0 = n$ or $n \ge 0$,
which is what we need for most of the theorems on numerals.


Theorem 3. If $n$ is a numeral, then $\vdash n \in \mathbb{N}_0$.

Proof. By induction. If $n \in \{0, 1, 2, 3\}$, then $n \in \mathbb{N}_0$ is a basic fact.
Otherwise, $n = 4m + a$ for some numeral $m$ and $a \in \{0, 1, 2, 3\}$, and by
induction we can prove $m \in \mathbb{N}_0$, and $a \in \mathbb{N}_0$ is a basic fact. Then
the result follows from application of the theorem

$$\frac{\vdash m \in \mathbb{N}_0 \qquad \vdash a \in \mathbb{N}_0}{\vdash 4m + a \in \mathbb{N}_0} \quad \textsf{decclc}.$$

□


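Theorem 4. If $n$ is a nonzero numeral, then $\vdash n \in \mathbb{N}$.

Proof. By induction; the theorems involved are

$$\frac{\vdash m \in \mathbb{N}}{\vdash 4m + 0 \in \mathbb{N}} \quad \textsf{decnncl2}, \qquad
\frac{\vdash m \in \mathbb{N}_0 \qquad \vdash a \in \mathbb{N}}{\vdash 4m + a \in \mathbb{N}} \quad \textsf{decnnclc}.$$

If $n \in \{1, 2, 3\}$, then $n \in \mathbb{N}$ is a basic fact. Otherwise, $n = 4m + a$
for some nonzero numeral $m$ and $a \in \{0, 1, 2, 3\}$, and by induction
$m \in \mathbb{N}$. If $a = 0$, then by decnncl2 $n \in \mathbb{N}$; otherwise
$a \in \mathbb{N}$ is a basic fact and $n \in \mathbb{N}$ follows from decnnclc. □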


Theorem 5. The numeral representation is unique, in the sense that if $m = n$
as integers, then $m$ and $n$ are identical terms.

Proof. Follows from the uniqueness of base-4 representation of integers. □


Remark 2. Note that each of the extended representations of numerals invalidates
this theorem, e.g. $(4 \cdot 2) + 0 = 8 = 4 \cdot 2 = (2 : 0)$ would all be distinct
representations of the same integer value whose standard form is $(4 \cdot 2) + 0$.
Nevertheless, there is still a weaker form of uniqueness which is preserved. If
$m, n$ are extended numerals and $m = n$ as integers, then $\vdash m = n$ is provable;
this follows from the respective “conversion” theorems for each extension together
with the equality theorems for addition and multiplication
($a = b, c = d \vdash a + c = b + d$, $a \cdot c = b \cdot d$).

The converse of this, $\vdash m = n \Rightarrow m = n$, follows from soundness of ZFC
from our interpretation of terms in the object logic as integers with the usual
operations in the metalogic. This is why we will often not distinguish between
equality in the metalogic and the object logic, because they coincide with each
other and with identity as terms by Theorem 5.


Theorem 6. If $n$ is a numeral then $\vdash 4a + b = n$ for some numeral $a$ (not
necessarily nonzero) and some $b \in \{0, 1, 2, 3\}$.

Proof. If $n \in \{0, 1, 2, 3\}$, then we can use the theorem

$$\frac{\vdash a \in \mathbb{N}_0}{\vdash 4 \cdot 0 + a = a} \quad \textsf{dec0h},$$

since $n \in \mathbb{N}_0$ is a basic fact. Otherwise, $n = 4a + b$ for some nonzero
numeral $a$ and $b \in \{0, 1, 2, 3\}$, and the goal statement is just
$\vdash 4a + b = 4a + b$. □


Theorem 7. If $m < n$ are numerals, then $\vdash m < n$.

Proof. By induction on $m$; the relevant theorems are

$$\frac{\vdash a \in \mathbb{N} \qquad \vdash b, c \in \mathbb{N}_0 \qquad \vdash c < 4}{\vdash c < 4a + b} \quad \textsf{declti},$$

$$\frac{\vdash a, b \in \mathbb{N}_0 \qquad \vdash c \in \mathbb{N} \qquad \vdash b < c}{\vdash 4a + b < 4a + c} \quad \textsf{declt},$$

$$\frac{\vdash a, b, c, d \in \mathbb{N}_0 \qquad \vdash b < 4 \qquad \vdash a < c}{\vdash 4a + b < 4c + d} \quad \textsf{decltc}.$$

If $m, n \in \{0, 1, 2, 3\}$, then $m < n$ is a basic fact. Otherwise if
$m \in \{0, 1, 2, 3\}$ and $n = 4a + b$, then since $a$ is nonzero, $a \in \mathbb{N}$ by
Theorem 4 and $m < 4$ is a basic fact, so declti applies to give $\vdash m < n$. If
$m = 4a + b$, then $m \ge 4$ so $n = 4c + d$ (because $m < n$ implies
$n \notin \{0, 1, 2, 3\}$) for nonzero numerals $a, c$ and $b, d \in \{0, 1, 2, 3\}$. If
$a = c$, then by Theorem 5 $a$ and $c$ are identical, so (working in the metalogic)
$4a + b < 4a + d \to b < d$, and since $b, d \in \{0, 1, 2, 3\}$ this is a basic fact;
thus declt applies and $\vdash m < n$. Again working in the metalogic, if $a > c$
then $4a + b \ge 4a \ge 4c + 4 > 4c + d$, in contradiction to the assumption
$m < n$, so in the other case $a < c$ and by the induction hypothesis
$\vdash a < c$; and $b < 4$ is a basic fact, hence decltc applies. □
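The case split in this proof can be read as a procedure for choosing which theorem to apply. The Python sketch below is a simplification for illustration only: numerals are represented by their integer values, closure hypotheses and digit-level basic facts are summarized as `"basic"`, and the function name `lt_proof` is hypothetical.

```python
def lt_proof(m, n):
    """Return the labels of the theorems used to prove m < n, outermost first."""
    if not 0 <= m < n:
        raise ValueError("requires 0 <= m < n")
    if n < 4:                        # both numerals are single digits
        return ["basic"]
    if m < 4:                        # a digit against a multi-digit numeral
        return ["declti"]
    a, b = divmod(m, 4)
    c, d = divmod(n, 4)
    if a == c:                       # identical leading parts: compare last digits
        return ["declt"]
    # Otherwise a < c, as argued in the proof, and we recurse on the leading parts.
    return ["decltc"] + lt_proof(a, c)
```

The recursion strips one base-4 digit per step, so the number of theorem applications is at most the number of digits of `m` plus one.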
As an example of Theorem 7, we know that $\vdash 3 < 4 \cdot 3 + 1$ and
$\vdash 4 \cdot 2 + 0 < 4 \cdot 2 + 1$ are theorems of set.mm because $3 < 13$ and
$8 < 9$, respectively (and the numeral representations of these numbers are
$[3] = 3$, $[13] = 4 \cdot 3 + 1$ and so on).

Next we show how to do successors, addition, and multiplication. 

Definition 4. As a variant of Definition 1, we use the notation $[x]$ to denote
the unique numeral corresponding to the expression $x$. Thus for example
$[6 + 5] = (4 \cdot 2) + 3$.

Remark 3. Note that for $0 \le x \le 10$, the statement $[x] = \langle x \rangle$ is either
an identity or one of dec4, dec5, ..., dec10, and so is provable in one step.


Theorem 8. If $n$ is a numeral and $\vdash n = n'$, then $\vdash [n + 1] = n' + 1$.

Proof. By induction on $n$; the relevant theorems are

$$\frac{\vdash a, b \in \mathbb{N}_0 \qquad \vdash c = b + 1 \qquad \vdash 4a + b = n}{\vdash 4a + c = n + 1} \quad \textsf{decsuc},$$

$$\frac{\vdash a \in \mathbb{N}_0 \qquad \vdash b = a + 1 \qquad \vdash 4a + 3 = n}{\vdash 4b + 0 = n + 1} \quad \textsf{decsucc2}.$$

If $n \in \{0, 1, 2\}$, then $[n + 1] \in \{1, 2, 3\}$ and $[n + 1] = n + 1$ is a basic
fact, and $\vdash n + 1 = n' + 1$. Otherwise, by Theorem 6, $n = 4a + b$ for some (not
necessarily nonzero) numeral $a$ and $b \in \{0, 1, 2, 3\}$. If $b \in \{0, 1, 2\}$, then
$[n + 1] = 4a + [b + 1]$, and so the assumptions $[b + 1] = b + 1$, $4a + b = n'$ for
decsuc are satisfied. Otherwise $b = 3$ and $[n + 1] = 4[a + 1] + 0$, and by the
induction hypothesis $\vdash [a + 1] = a + 1$ (using $n, n' \mapsto a$). Then
$[a + 1] = a + 1$ and $4a + 3 = n'$ satisfy the assumptions of decsucc2. □


Remark 4. The extra assumption $\vdash n = n'$ is not necessary (i.e. we could just
have proven $\vdash [n + 1] = n + 1$) but makes it a little easier to work with
extended numerals, because that way $n$ can be the standard numeral while $n'$ is
the extended numeral, and the assumption is satisfied by Remark 2.

To give an example, if we were able to prove (using other theorems than
discussed here) that $\vdash 4 \cdot 1 + 1 = 6 - 1$, then Theorem 8 says, taking
$n = [5] = 4 \cdot 1 + 1$ and $n' = 6 - 1$, that $\vdash 4 \cdot 1 + 2 = (6 - 1) + 1$ is
also provable.

Theorem 9. If $m, n$ are numerals such that $\vdash m = m'$ and $\vdash n = n'$, then
$\vdash [m + n] = m' + n'$.

Proof. By induction on $m + n$; the relevant theorems are

$$\frac{\vdash a, b, c, d \in \mathbb{N}_0 \qquad \vdash 4a + b = m \qquad \vdash 4c + d = n \qquad \vdash e = a + c \qquad \vdash f = b + d}{\vdash 4e + f = m + n} \quad \textsf{decadd},$$

$$\frac{\vdash a, b, c, d, f \in \mathbb{N}_0 \qquad \vdash 4a + b = m \qquad \vdash 4c + d = n \qquad \vdash e = (a + c) + 1 \qquad \vdash 4 + f = b + d}{\vdash 4e + f = m + n} \quad \textsf{decaddc}.$$

If $m + n \in \{0, 1, 2, 3\}$, then $[m + n] = m + n$ is a basic fact, so
$\vdash [m + n] = m' + n'$ follows from properties of equality. By Theorem 6 we can
promote $m$ and $n$ to the form $m = 4a + b$, $n = 4c + d$ where $b, d < 4$ and $a, c$
are numerals. Now if $b + d < 4$, then $\vdash b + d < 4$ follows from the basic facts
$[b + d] = b + d$ and $[b + d] < 4$, and so filling the assumptions with
$4a + b = m'$, $4c + d = n'$, and $[b + d] = b + d$, and using the induction
hypothesis to prove $[a + c] = a + c$, we can apply decadd. Otherwise,
$b + d \ge 4$, so $[b + d - 4] \in \{0, 1, 2, 3\}$ and we can use $4a + b = m'$,
$4c + d = n'$, prove $4 + [b + d - 4] = b + d$ using the basic facts
$\langle b + d \rangle = 4 + [b + d - 4]$ and $\langle b + d \rangle = b + d$, and prove
$[a + c + 1] = [a + c] + 1 = (a + c) + 1$ using first Theorem 8 and then the
induction hypothesis; this completes the assumptions of decaddc. □
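The proof is in effect “addition with carry”. A minimal Python sketch of the choice between decadd and decaddc follows; it works on integer values rather than terms, lumps basic facts and the successor step of Theorem 8 into the labels shown, and uses the hypothetical name `add_proof`.

```python
def add_proof(m, n):
    """Return the labels of the theorems used to prove [m + n] = m + n, outermost first."""
    if m < 0 or n < 0:
        raise ValueError("requires nonnegative integers")
    if m + n < 4:                    # the sum is a single digit: a basic fact
        return ["basic"]
    a, b = divmod(m, 4)
    c, d = divmod(n, 4)
    if b + d < 4:                    # no carry out of the last digit
        return ["decadd"] + add_proof(a, c)
    # Carry: the hypothesis e = (a + c) + 1 additionally needs a successor step,
    # which is not tracked separately in this sketch.
    return ["decaddc"] + add_proof(a, c)
```

Since $a + c < m + n$ whenever $m + n \ge 4$, the recursion terminates, and its depth is bounded by the number of base-4 digits of the larger argument.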


The next theorem works by double induction, so it is easier to split it into 
two parts. 

Theorem 10. If $m, n$ are numerals, $p \in \{0, 1, 2, 3\}$, and
$\vdash m = m', n = n'$, then $\vdash [mp + n] = m'p + n'$.

Proof. By induction on $m$, using the theorem

$$\frac{\vdash a, b, c, d, p, f, g \in \mathbb{N}_0 \qquad \vdash 4a + b = m \qquad \vdash 4c + d = n \qquad \vdash e = ap + (c + g) \qquad \vdash 4g + f = bp + d}{\vdash 4e + f = mp + n} \quad \textsf{decmac}.$$

If $m \in \{0, 1, 2, 3\}$, then $[mp] = \langle mp \rangle$ is provable by Remark 3 and
$\langle mp \rangle = mp$ is a basic fact, so $\vdash [mp + n] = [mp] + n = m'p + n'$ by
Theorem 9. Otherwise let $m = 4a + b$, and write $n = 4c + d$ and
$[bp + d] = 4g + f$ for some $c, d, f, g$ by Theorem 6. Then the assumptions to
decmac are satisfied by $4a + b = m'$, $4c + d = n'$,
$[ap + c + g] = ap + [c + g] = ap + (c + g)$ by the induction hypothesis and
Theorem 9, and $4g + f = [bp + d] = [bp] + d = bp + d$ by Theorem 9. □


Theorem 11. If $m, n, p$ are numerals, and $\vdash m = m', n = n', p = p'$, then
$\vdash [mp + n] = m'p' + n'$.

Proof. By induction on $p$, using the theorem

$$\frac{\vdash a, b, c, d, m, f, g \in \mathbb{N}_0 \qquad \vdash 4a + b = p \qquad \vdash 4c + d = n \qquad \vdash e = ma + (c + g) \qquad \vdash 4g + f = mb + d}{\vdash 4e + f = mp + n} \quad \textsf{decma2c}.$$

If $p \in \{0, 1, 2, 3\}$, then $[mp + n] = m'p + n' = m'p' + n'$ by Theorem 10.
Otherwise let $p = 4a + b$, and write $n = 4c + d$ and $[mb + d] = 4g + f$ for some
$c, d, f, g$ by Theorem 6. Then the assumptions to decma2c are satisfied by
$4a + b = p'$, $4c + d = n'$, $[ma + c + g] = ma + [c + g] = ma + (c + g)$ by the
induction hypothesis and Theorem 9, and $4g + f = [mb + d] = mb + d$ by
Theorem 10. □


Corollary 1. If $m, n$ are numerals, and $\vdash m = m', n = n'$, then
$\vdash [mn] = m'n'$.

Proof. Theorem 11 gives $[mn] = m'n' + 0$, and $m'n' + 0 = m'n'$ because
$m'n' = mn \in \mathbb{N}_0$. (Slightly more efficient than this approach is an
application of the theorems decmul1c, decmul2c, which are essentially the same as
decmac, decma2c without the addition component.) □
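The double induction of Theorems 10 and 11 can be sketched as two mutually dependent Python functions that compute $mp + n$ one base-4 digit at a time. This is an illustration on integer values only; the names `mac_digit` and `mac` are hypothetical, and the single-digit base case stands in for the basic facts and Theorem 9.

```python
def mac_digit(m, p, n):
    """Theorem 10 pattern: compute m*p + n for a single digit p (0 <= p < 4)."""
    if not 0 <= p < 4:
        raise ValueError("p must be a base-4 digit")
    if m < 4:                          # base case: digit product plus an addition
        return m * p + n
    a, b = divmod(m, 4)                # m = 4a + b
    c, d = divmod(n, 4)                # n = 4c + d
    g, f = divmod(b * p + d, 4)        # 4g + f = bp + d
    e = mac_digit(a, p, c + g)         # e = ap + (c + g)
    return 4 * e + f                   # conclusion of decmac


def mac(m, p, n):
    """Theorem 11 pattern: compute m*p + n for arbitrary nonnegative p."""
    if p < 4:
        return mac_digit(m, p, n)
    a, b = divmod(p, 4)                # p = 4a + b
    c, d = divmod(n, 4)                # n = 4c + d
    g, f = divmod(mac_digit(m, b, d), 4)   # 4g + f = mb + d
    e = mac(m, a, c + g)               # e = ma + (c + g)
    return 4 * e + f                   # conclusion of decma2c
```

Taking `n = 0` gives plain multiplication, as in Corollary 1.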
3 A Mathematica Implementation of the Decimal Arithmetic Algorithm

The purpose of the preceding section was not merely to prove that arithmetic
operations are possible, which could be done just as easily using finite sums or
unary representation. Rather, because the results are proven constructively, the
proofs are in effect a description of an algorithm for performing arithmetic
calculations, and as such it is not difficult to implement on a computer. Due to
its advanced pattern-matching capabilities, Mathematica was selected as the
language of choice for the implementation.

In order to avoid Mathematica’s automatic reduction of arithmetic expressions,
we represent the Metamath formulas of our limited domain via the following
correspondence:

- An integer $n$ is represented as itself
- The term $a + b$ becomes pl[a, b]
- The term $a \cdot b$ becomes tm[a, b]
- $\vdash a = b$ becomes eq[a, b]
- $\vdash a < b$ becomes lt[a, b]
- $\vdash a \in \mathbb{N}$ becomes elN[a]
- $\vdash a \in \mathbb{N}_0$ becomes elN0[a]
- $\vdash a \in \mathbb{C}$ becomes elC[a]

These symbols have no evaluation semantics and so simply serve to store the
shape of the target expression.

The output proof is stored as a tree of list expressions by the following
correspondence:

If the expression $e$ is obtained by applying the theorem $t$ to the list of
expressions $e_1, \ldots, e_k$, with proofs $p_1, \ldots, p_k$, then the proof of $e$
is stored as the expression $p = \{\{p_1, \ldots, p_k\}, e, t\}$.

For example, the expression $4 \cdot (4 \cdot 1 + 3) + 2 = 5 \cdot 6$ is represented as

    eq[pl[tm[4, pl[tm[4, 1], 3]], 2], tm[5, 6]]

and the proof of $2 \cdot (4 \cdot 1 + 1) \in \mathbb{N}_0$ (by the sequence of theorems:
$2 \in \mathbb{N}_0$ by 2nn0, $1 \in \mathbb{N}_0$ by 1nn0, $4 \cdot 1 + 1 \in \mathbb{N}_0$ by
decclc, $2 \cdot (4 \cdot 1 + 1) \in \mathbb{N}_0$ by nn0mulcli) is:

    {{{{}, elN0[2], "2nn0"},
      {{{{}, elN0[1], "1nn0"},
        {{}, elN0[1], "1nn0"}},
       elN0[pl[tm[4, 1], 1]], "decclc"}},
     elN0[tm[2, pl[tm[4, 1], 1]]], "nn0mulcli"}

(There are more efficient storage mechanisms, but this one is relatively easy
to take apart and reorganize. Furthermore, since it only needs to run once in
order to produce the proof, we are much more concerned with the length of the
output proof than the speed of the proof generation itself.) For intermediate
steps, we will also have use for the “proof stubs” {Null, e, "?"} (representing a
proof with goal expression e that has not been completed) and {$Failed, e, "?"}
(for a step that is impossible to prove or lies outside the domain of the prover).

Given a term expression, we can evaluate it easily using a pattern-matching 
function: 

eval[pl[a_, b_]] := eval[a] + eval[b] 
eval[tm[a_, b_]] := eval[a] eval[b] 
eval[n_Integer] := n 

Then the domain of our theorem prover will be expressions of the form eq[m, n]
where m and n are terms built from the integers 0-10 and pl, tm which
satisfy eval[m] == eval[n]. (As a side effect we will also support elN0[n], and
elN[n] when eval[n] != 0.) We will also need the reverse conversion:

    bb[n_] := If[n < 4, n, pl[tm[4, bb[Quotient[n, 4]]], Mod[n, 4]]]

This is the equivalent of the $[x]$ function from Definition 4.

Our algorithm works in reverse from a given goal expression, breaking it
down into smaller pieces until we reach the basic facts. The easiest type of proof
is closure in $\mathbb{N}_0$:

    (* integers 0-4 have direct N0 closure theorems; larger ones go via N *)
    prove[elN0[n_Integer]] :=
     With[{s = ToString[n]},
      If[n <= 4, {{}, elN0[n], s <> "nn0"},
       {{prove[elN[n]]}, elN0[n], "nnnn0i"}]]
    (* numerals 4a + b *)
    prove[x : elN0[pl[tm[4, a_], b_]]] :=
     {{prove@elN0[a], prove@elN0[b]}, x, "decclc"}
    (* general products and sums *)
    prove[x : elN0[tm[a_, b_]]] :=
     {{prove@elN0[a], prove@elN0[b]}, x, "nn0mulcli"}
    prove[x : elN0[pl[a_, b_]]] :=
     {{prove@elN0[a], prove@elN0[b]}, x, "nn0addcli"}
    prove[elN[n_Integer]] /; n > 0 := {{}, elN[n], ToString[n] <> "nn"}
    prove[elN[x : pl[y : tm[4, a_], b_Integer]]] :=
     If[b === 0,
      {{{{prove@elN0[a]}, eq[x, y], "dec0u"}, prove@elN[y]}, elN[x], "eqeltri"},
      {{prove@elN0[a], prove@elN[b]}, elN[x], "decnnclc"}]
    prove[elN[x : tm[4, a_]]] := {{prove@elN[a]}, elN[x], "decnncl"}

This simply breaks up an expression according to its head and applies decclc
to numerals, nn0addcli and nn0mulcli to integer addition and multiplication, and
theorems 0nn0, 1nn0, ..., 4nn0, 5nn, ..., 10nn to the integers 0-10 (where we
switch to $\mathbb{N}$ closure for numbers larger than 4 because we do not have
$\mathbb{N}_0$ closure theorems prepared for these). Similarly, we can do closure for
$\mathbb{N}$:
    prove[elN[n_Integer]] /; n > 0 := {{}, elN[n], ToString[n] <> "nn"}
    prove[elN[x : pl[y : tm[4, a_], b_Integer]]] :=
     If[b === 0,
      {{prove@elN[a]}, elN[x], "decnncl2"},
      {{prove@elN0[a], prove@elN[b]}, elN[x], "decnnclc"}]
    prove[elN[x : tm[4, a_]]] := {{prove@elN[a]}, elN[x], "decnncl"}

This is just an implementation of Theorem 4.

In order to do arbitrary equalities, we first ensure that both arguments are
numerals, by chaining $x = [x] < [y] = y$ for inequalities and $x = [x] = y$ for
equalities (where $[x]$ and $[y]$ are identical since we are assuming that the
equality we are proving is in fact correct). We also allow the case “$x < 4$”
where $x$ is a numeral even though 4 is not a numeral, because it comes up often
and we already have theorems for this case.

    prove[lt[x_, y_]] /; eval[x] < eval[y] :=
     If[x === bb@eval[x],
      If[y === bb@eval[y] || y === 4,
       provelt[x, y],
       {{prove@lt[x, bb@eval[y]], proveeq2[y]}, lt[x, y], "breqtri"}],
      {{proveeq2[x], prove@lt[bb@eval[x], y]}, lt[x, y], "eqbrtrri"}]

    prove[eq[x_, y_]] /; eval[x] == eval[y] :=
     If[x === bb@eval[x],
      proveeq2[y],
      {{proveeq2[x], proveeq2[y]}, eq[x, y], "eqtr3i"}]

    proveeq2[x_] := proveeq[bb@eval[x], x]

It remains to define provelt and proveeq. We start with provelt, implementing
Theorem 7.

    provelt[x_Integer, y_Integer] :=
     {{}, lt[x, y], ToString[x] <> "lt" <> ToString[y]}
    provelt[x_Integer, y : pl[tm[4, a_], b_]] :=
     {{prove@elN[a], prove@elN0[b], prove@elN0[x], provelt[x, 4]},
      lt[x, y], "declti"}
    provelt[x : pl[tm[4, a_], c_], y : pl[tm[4, b_], d_]] :=
     If[a === b,
      {{prove@elN0[a], prove@elN0[c], prove@elN[d], provelt[c, d]},
       lt[x, y], "declt"},
      {{prove@elN0[a], prove@elN0[b], prove@elN0[c], prove@elN0[d],
        provelt[c, 4], provelt[a, b]}, lt[x, y], "decltc"}]

The contract of proveeq[x, y] is that it returns a proof of $\vdash x = y$ given
an expression y, assuming $x = [y]$. (Since we can calculate x from y, the left
argument is not strictly necessary, as the variant proveeq2 shows, but it
simplifies pattern matching.) It is defined in much the same way as the previous
functions; we elide it here due to space constraints.

Basic facts. One fine point which may need addressing, since it was largely
glossed over in Theorem 2, is the algorithm for basic facts. We define a function
basiceq[x] which is valid when x is either pl[m, n] or tm[m, n] and
eval[x] <= 10; in this case it corresponds to a “basic fact” of either addition or
multiplication, and the return value is a proof of eq[eval[x], x]. There are many
special cases, but every such statement follows from addid1i, addid2i (addition
with 0), mulid1i, mulid2i (multiplication by 1), mul01i, mul02i (multiplication by
0), df-2, ..., df-10 (addition with 1), or a named theorem like 3t2e6 for
$3 \cdot 2 = 6$ (these exist for each valid triple containing numbers larger than
1), possibly followed with addcomi, mulcomi (commutation) and/or eqcomi, eqtri
(symmetry/transitivity of equality) to tie the components together.

4 Results 

In this section we will describe the purpose to which we applied the arithmetic 
algorithm. 


4.1 Prime Numbers 


There are several ways to prove that a number is prime, and the relative
efficiency can depend a lot on the size of the numbers involved. For small numbers,
especially if it is necessary to find all primes below a cutoff, the most efficient
method is simple trial division. The biggest improvement in efficiency here is
gained by looking only at primes less than $\sqrt{n}$ in a proof that $n$ is prime.
Starting from the two primes 2, 3 which are proven “from first principles”, we can
use this to show that a number less than 25 which is not divisible by 2 or 3 is
prime, and we reduce these primality/compositeness deductions to integer statements

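$$\frac{\vdash q \in \mathbb{N}_0 \qquad \vdash a, r \in \mathbb{N} \qquad \vdash b = aq + r \qquad \vdash r < a}{\vdash a \nmid b} \quad \textsf{ndvdsi},$$

$$\frac{\vdash a, b \in \mathbb{N} \qquad \vdash 1 < a \qquad \vdash 1 < b \qquad \vdash n = ab}{\vdash n \notin \mathbb{P}} \quad \textsf{nprmi}.$$

As a consequence of our choice of base 4, we also have easy proofs of
compositeness or non-divisibility by 2:

$$\frac{\vdash a \in \mathbb{N}_0}{\vdash 2 \nmid 4a + 1} \quad \textsf{dec2dvds1}, \qquad
\frac{\vdash a \in \mathbb{N}_0}{\vdash 2 \nmid 4a + 3} \quad \textsf{dec2dvds3}, \qquad
\frac{\vdash a \in \mathbb{N}}{\vdash 4a + 2 \notin \mathbb{P}} \quad \textsf{dec2nprm}.$$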


This yields proofs of $5, 7, 11, 13, 17, 19, 23 \in \mathbb{P}$. We repeat the process
to show that a number less than $29^2 = 841$ which is not divisible by
2, 3, 5, 7, 11, 13, 17, 19, 23 is prime (the upper bound is $29^2$ because 29 is the
first prime larger than 23), and we use this theorem to prove primality of
37, 43, 83, 139, 163, 317, 631. There are three more primes needed for the prime
sequence in Bertrand’s postulate, namely 1259, 2503, 4001, and we would need many
more primes to repeat the process with a still-larger upper bound, so we switch
methods.
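The trial-division argument amounts to producing, for each relevant prime $a$, a quotient $q$ and a nonzero remainder $r$ so that the hypotheses of ndvdsi can be checked. A minimal Python sketch follows; the function name `trial_division_certificate` is hypothetical, and the caller is assumed to supply the complete list of primes below a bound whose square exceeds `n`.

```python
def trial_division_certificate(n, small_primes):
    """Return [(a, q, r), ...] with n == a*q + r and 0 < r < a for every listed
    prime a with a*a <= n, or None if some listed prime divides n.

    The result certifies primality only if `small_primes` contains every prime
    whose square is at most n (an assumption left to the caller)."""
    certificate = []
    for a in small_primes:
        if a * a > n:
            break
        q, r = divmod(n, a)
        if r == 0:
            return None
        certificate.append((a, q, r))
    return certificate
```

For $n = 11$ the certificate consists of exactly the two decompositions $11 = 2 \cdot 5 + 1 = 3 \cdot 3 + 2$ from equation (4).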
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Pocklington’s Theorem. 

Theorem 12. If TV > 1 is an integer such that TV — 1 = AB with A > B, and 
for every prime factor p of A there is an a p such that a^ -1 = 1 (mod TV) and 

gcd(ap Ar_1 ^ p — 1,7V) = 1, then TV is prime. 

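This theorem has been proven in set.mm in decimal-friendly form (in the
case when $A = p^e$ is a prime power) as

$$\frac{\begin{array}{c} \vdash p \in \mathbb{P} \quad \vdash g, B, e, a \in \mathbb{N} \quad \vdash m = gp \quad \vdash N = m + 1 \quad \vdash m = Bp^e \\ \vdash B < p^e \quad \vdash a^m \equiv 1 \pmod N \quad \vdash \gcd(a^g - 1, N) = 1 \end{array}}{\vdash N \in \mathbb{P}}\ \text{pockthi}.$$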
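The numeric hypotheses of pockthi can be checked mechanically before any proof is attempted. The following Python sketch does this for given $N, p, e, a$; the function name `pockthi_hypotheses_hold` is hypothetical, and the parameters $p = 5$, $e = 3$ for $N = 4001$ (from $4000 = 32 \cdot 5^3$) are chosen here for illustration and are not necessarily those used in set.mm. The primality of $p$ is assumed, as it is a separate hypothesis of the rule.

```python
from math import gcd


def pockthi_hypotheses_hold(N, p, e, a):
    """Check the arithmetic hypotheses of pockthi for N, p, e, a.

    Assumption: p is already known to be prime; this function does
    not verify it.
    """
    m = N - 1
    pe = p ** e
    if m % pe != 0:
        return False  # m = B * p^e has no solution B (so m = g * p may fail too)
    g = m // p   # m = g * p
    B = m // pe  # m = B * p^e
    return (
        B < pe
        and pow(a, m, N) == 1 % N           # a^m = 1 (mod N)
        and gcd(pow(a, g, N) - 1, N) == 1   # gcd(a^g - 1, N) = 1
    )


if __name__ == "__main__":
    # Search for a base a that works for N = 4001 with p^e = 5^3.
    witness = next(a for a in range(2, 4001)
                   if pockthi_hypotheses_hold(4001, 5, 3, a))
    print("witness a =", witness)
```

Reducing $a^g$ modulo $N$ before taking the gcd does not change the result, since $\gcd(x, N)$ depends only on $x$ modulo $N$.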


Ignoring the $p^e$ term, which is easily evaluated by writing it out as a
product of $p$’s since $p^e < N$ is relatively small, there are two new kinds
of integer statements involved here, $a \equiv b \pmod N$ and
$\gcd(c, d) = e$. We can further narrow our concern to statements of the form
$a^m \equiv b \pmod N$ and $\gcd(c, d) = 1$ (where $c > d$), and we can reduce
the second via Euclid’s algorithm in the form


$$\frac{\vdash k, r, n \in \mathbb{N}_0 \quad \vdash m = kn + r \quad \vdash \gcd(n, r) = g}{\vdash \gcd(m, n) = g}\ \text{gcdi}.$$
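The sequence of gcdi applications needed for a given pair can be read off from the ordinary Euclidean algorithm. Here is a minimal Python sketch; the name `gcdi_chain` and the tuple layout are hypothetical. Each recorded tuple $(m, n, k, r)$ satisfies $m = kn + r$, and reading the list from last to first gives the order in which the rule would be applied, starting from the base fact $\gcd(g, 0) = g$.

```python
def gcdi_chain(m, n):
    """Return (g, steps); each step (m, n, k, r) satisfies m = k*n + r."""
    steps = []
    while n != 0:
        k, r = divmod(m, n)
        steps.append((m, n, k, r))
        m, n = n, r  # gcd(m, n) = gcd(n, r)
    return m, steps


if __name__ == "__main__":
    g, steps = gcdi_chain(4001, 17)
    print("gcd =", g)
    for step in steps:
        print("m=%d n=%d k=%d r=%d" % step)
```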


For the power mod calculations, we used addition-chain exponentiation on
manually selected chains which had good calculational properties (small
intermediate calculations), since the hardest step in the calculation

$$\frac{\vdash e = b + c \quad \vdash dn + m = kl \quad \vdash a^b \equiv k,\ a^c \equiv l \pmod n}{\vdash a^e \equiv m \pmod n}$$

is evaluating the expression $dn + m = kl$, whose proof length is driven by
the size of $k, l$, so that exponents with smaller reduced forms yield shorter
proofs. In the worst case, for $N = 4001$ we are calculating products on the
order $kl < N^2 = 16\,008\,001$, which are roughly 12-digit numbers in base 4.
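The data required by each application of the rule above, namely $d, m, k, l$ with $dn + m = kl$, can be generated by walking an addition chain. The following Python sketch does this; the name `powmod_by_chain` and the chain `CHAIN_4000` are hypothetical, and the chain is a simple illustrative one, not one of the manually selected chains used for the actual proofs.

```python
def powmod_by_chain(a, n, chain):
    """Evaluate powers of a modulo n along an addition chain.

    `chain` is a list of (b, c) pairs; each pair produces the exponent
    e = b + c, where b and c are 1 or exponents produced earlier.
    Returns (table, steps): table maps exponent -> a^exponent mod n, and
    steps lists (e, b, c, d, m, k, l) with d*n + m == k*l.
    """
    table = {1: a % n}
    steps = []
    for b, c in chain:
        k, l = table[b], table[c]
        d, m = divmod(k * l, n)  # the product k*l is the costly part
        e = b + c
        table[e] = m
        steps.append((e, b, c, d, m, k, l))
    return table, steps


# Illustrative chain reaching exponent 4000 (an assumption, see above).
CHAIN_4000 = [(1, 1), (2, 2), (4, 1), (5, 5), (10, 10), (20, 5), (25, 25),
              (50, 50), (100, 25), (125, 125), (250, 250), (500, 500),
              (1000, 1000), (2000, 2000)]

if __name__ == "__main__":
    table, steps = powmod_by_chain(3, 4001, CHAIN_4000)
    print("3^4000 mod 4001 =", table[4000])
    print("largest product k*l:", max(k * l for *_, k, l in steps))
```

This sketch always reduces to the least nonnegative residue; a chain chosen for short proofs would aim to keep the intermediate values $k, l$ small.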



4.2 Bertrand’s Postulate Gets the Last Laugh 

As mentioned in Section 1.1, the framework described in this paper was
developed in preparation for performing the large calculations needed to prove
that numbers like 4001 are prime. After this work was completed, we discovered
that there was a new proof by Shigenori Tochiori [5] (unfortunately
untranslated to my knowledge) which, by strengthening the estimates in Erdős’
proof, manages to prove the asymptotic part for $n > 64$ (instead of
$n > 4000$), so that the explicit enumeration of primes above this became
unnecessary for the completion of the proof. Nevertheless, the proof of
$4001 \in \mathbb{P}$ remains a good example of a complicated arithmetic
proof, and we expect that this arithmetic system will make it much easier to
handle such problems in the future. In any case, bpos is now completed.

Footnote 2: closure assumptions elided.
4.3 Large Proofs

One recent formal proof which has gathered some attention is Thomas Hales’
proof of the Kepler Conjecture, also known as the Flyspeck project [6], and it
highlights one foundational issue regarding Metamath’s prospects in the QED
vision of the future [7]: which resources are stressed, and how heavily, by
projects like this with a large computational component? The design of
Metamath is
such that it can verify a proof in nearly linear time, assuming that it can store 
the entire database of theorem statements in memory, because at each step it 
need only verify that the substitutions to the theorem statement are in fact done 
correctly (and the substitutions themselves are stored as part of the proof, even 
though they can be automatically derived with more sophisticated and slower 
algorithms). Thus the primary bottleneck in verification time is the length of 
the proof itself, and we can analyze this quite easily for our chosen algorithm. 

Since we are essentially employing grade-school addition and multiplication
algorithms, it is easy to see that they are $O(n)$ and $O(n^2)$ respectively
in the number of digits $n$, and with more advanced multiplication algorithms
we could lower that to $O(n^{1.5})$ or lower.
Indeed, there does not seem to be any essential difference between the number 
of steps in a Metamath proof and the number of cycles that a computer might 
go through to perform the equivalent algorithm, even though a Metamath proof 
doesn’t “run” per se as it is a proof and not a program. (In fact a Metamath 
proof has at least one big advantage over a computer in that it can “guess the 
right answer” in the manner of a non-deterministic Turing machine.) To take 
this example to its conceptual extreme, we could even simulate an ALU with 
addition and multiplication of integers representing data values of a computer, 
and then the progress of the proof would directly correspond to the steps in a 
computer program. Of course this would introduce a ridiculously large constant, 
but it would suggest that any program, including a verifier for another formal 
system such as the HOL Light system in which Flyspeck runs, can be emulated 
with a proof whose length is comparable to the running time of the verifier 
without a change in the overall asymptotics. 
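To make the $O(n^2)$ estimate for grade-school multiplication concrete, here is a small Python sketch that multiplies two numbers digit by digit in base 4 and counts the single-digit products it performs. The function names are hypothetical, and the count of digit products is only a rough proxy for the number of proof steps, under the assumption that each digit product corresponds to a bounded number of theorem applications.

```python
def to_base4(x):
    """Little-endian list of base-4 digits of a nonnegative integer."""
    digits = []
    while True:
        x, d = divmod(x, 4)
        digits.append(d)
        if x == 0:
            return digits


def schoolbook_multiply(x, y):
    """Base-4 long multiplication; return (product, digit_products)."""
    xs, ys = to_base4(x), to_base4(y)
    acc = [0] * (len(xs) + len(ys))
    count = 0
    for i, dx in enumerate(xs):
        carry = 0
        for j, dy in enumerate(ys):
            count += 1  # one single-digit product per (i, j) pair
            total = acc[i + j] + dx * dy + carry
            carry, acc[i + j] = divmod(total, 4)
        acc[i + len(ys)] += carry
    product = sum(d * 4 ** i for i, d in enumerate(acc))
    return product, count


if __name__ == "__main__":
    product, count = schoolbook_multiply(4001, 4001)
    print(product, "computed with", count, "digit products")
    print("base-4 digits of the product:", len(to_base4(product)))
```

The count equals the product of the two digit lengths, which is the quadratic behavior referred to above.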